The combinatorics of tandem duplication trees.
Authors
Abstract
We developed a recurrence relation that counts the number of tandem duplication trees (either rooted or unrooted) that are consistent with a set of n tandemly repeated sequences generated under the standard unequal recombination (or crossover) model of tandem duplications. The number of rooted duplication trees is exactly twice the number of unrooted trees, which means that on average only two positions for a root on a duplication tree are possible. Using the recurrence, we tabulated these numbers for small values of n. We also developed an asymptotic formula that provides estimates of these numbers for large n. These numbers give a priori probabilities for phylogenies of the repeated sequences to be duplication trees. This work extends earlier studies where exhaustive counts of the numbers for small n were obtained. One application showed the significance of finding that most maximum-parsimony trees constructed from repeat sequences from human immunoglobulins and T-cell receptors were tandem duplication trees. Those findings provided strong support for the proposed mechanisms of tandem gene duplication. The recurrence relation also suggests efficient algorithms to recognize duplication trees and to generate random duplication trees for simulation. We present a linear-time recognition algorithm.
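The "a priori probability" and "two root positions" remarks can be made concrete with standard tree counts. The sketch below does not implement the paper's recurrence, which the abstract does not state. It only uses the well-known totals for binary tree topologies on n labelled leaves, (2n-5)!! unrooted and (2n-3)!! rooted, and takes the number of duplication trees as an input supplied by the reader. All function names are hypothetical.

```python
from fractions import Fraction


def double_factorial(k: int) -> int:
    """Return k * (k-2) * (k-4) * ... down to 1 or 2; 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_binary_trees(n: int) -> int:
    """Number of unrooted binary tree topologies on n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return double_factorial(2 * n - 5)


def num_rooted_binary_trees(n: int) -> int:
    """Number of rooted binary tree topologies on n >= 2 labelled leaves."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return double_factorial(2 * n - 3)


def num_edges_unrooted(n: int) -> int:
    """An unrooted binary tree on n leaves has 2n - 3 edges,
    each of which is a candidate position for a root."""
    return 2 * n - 3


def prior_probability(num_duplication_trees: int, n: int, rooted: bool = True) -> Fraction:
    """Fraction of all tree topologies on n leaves that are duplication trees.

    num_duplication_trees is an input (for example a value taken from the
    paper's tables); it is NOT computed here.
    """
    total = num_rooted_binary_trees(n) if rooted else num_unrooted_binary_trees(n)
    return Fraction(num_duplication_trees, total)
```

For general binary trees the rooted count is (2n-3) times the unrooted count, because any of the 2n-3 edges can carry the root. For duplication trees the abstract reports a factor of exactly 2 instead, which is why, on average, only two of those 2n-3 edges are admissible root positions.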
Similar resources
The Combinatorics of Tandem Duplication Trees
We develop a recurrence relation that counts the number of Tandem Duplication Trees (either rooted or unrooted) that are consistent with a set of n tandemly repeated sequences generated under the standard unequal recombination (or crossover) model of tandem duplications. We find that the number of rooted duplication trees is exactly twice the number of unrooted trees, which means, on average, o...
The combinatorics of tandem duplication
Tandem duplication is an evolutionary process whereby a segment of DNA is replicated and proximally inserted. The different configurations that can arise from this process give rise to some interesting combinatorial questions. Firstly, we introduce an algebraic formalism to represent this process as a word producing automaton. The number of words arising from n tandem duplications can then be r...
LETTER: On Counting Tandem Duplication Trees
Large genomes are full of repeated DNA sequences. It was estimated that over half of the human DNA consists of repeated sequences (Baltimore 2001; Eichler 2001; Leem et al. 2002). Tandem duplication is one of the important evolutionary mechanisms for producing repeated DNA sequences, in which the copies that may or may not contain genes are adjacent along the genome. Fitch (1977) first observed...
Reconstructing the duplication history of tandemly repeated genes.
We present a novel approach to deal with the problem of reconstructing the duplication history of tandemly repeated genes that are supposed to have arisen from unequal recombination. We first describe the mathematical model of evolution by tandem duplication and introduce duplication histories and duplication trees. We then provide a simple recursive algorithm which determines whether or not a ...
An efficient and accurate distance based algorithm to reconstruct tandem duplication trees
The problem of reconstructing the duplication tree of a set of tandemly repeated sequences that are supposed to have arisen through unequal recombination was first introduced by Fitch (1977, Genetics, 86, 93-104) and has recently received a lot of attention. In this paper, we describe DTSCORE, a fast distance-based algorithm to reconstruct tandem duplication trees, which is statis...
Journal title:
- Systematic Biology
Volume 52, Issue 1
Pages -
Publication date: 2003